Data Visualization on CAI’s accommodation facilities
Author
Andrea Toaiari
Presenting the Dataset
A catalog of all the accommodation facilities owned by CAI (Club Alpino Italiano, the Italian Alpine Club) is available at https://rifugi.cai.it/. Data collection began about 7 years ago and produced a record for each of the 722 structures that belong to the main body of the club or to one of its many sections across the country. Besides geographical location, elevation, type of structure and contact info, each record states whether basic services such as running water and electricity are present.
The main purpose of this catalog is to offer a search engine that helps the end user choose a suitable structure for their trip. One of the future goals of the project is to extend the database with new information, such as the trails that lead to each structure.
About this project
I think this data is interesting and useful for several reasons. First of all, knowing in which areas the club has the most structures can be valuable for newcomers and for people who have just got in touch with the association. Of course, one can expect most of these areas to be in northern Italy, since it is the part of the country with the highest number of mountain ranges. Still, region-level conclusions can be drawn with the help of visualization tools.
Secondly, combining information about the services and the location of the structures lets us check whether particular services are constrained by the elevation and the inhospitality of the area where a structure is found. Moreover, data about water management and energy efficiency can be visualized to assess how modern the structures are with respect to their age.
One more direction to explore is the impact of tourism: purists would tell you that mountains should not be environmentally impacted for the sake of tourists, and that wild areas should be contaminated as little as possible. Data about the type of the structures can be used to investigate this.
Importing libraries
Importing data
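# fileEncoding = "UTF-8-BOM" strips the byte-order mark; empty strings are read as NA
dataset <- read.csv("data/dataset.csv", fileEncoding = "UTF-8-BOM", na.strings = c(""))
as_tibble(dataset)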
# A tibble: 750 × 98
id_cai geo_type geo_coordinates branch_id clustering_regione
<int> <chr> <chr> <int> <int>
1 921200203 Point [7.849818, 45.899672] 9212002 2
2 921601501 Point [7.8772939, 45.9519718] 9216015 1
3 921200201 Point [7.858085, 45.91364] 9212002 2
4 930010001 Point [7.8770155, 45.9270668] 9300100 1
5 921800104 Point [10.594504, 46.098845] 9218001 4
6 921605001 Point [10.547887, 46.496014] 9216050 4
7 921200202 Point [7.900306, 45.926354] 9212002 1
8 921200204 Point [7.8830236, 45.9124463] 9212002 1
9 921601502 Point [7.6692667, 45.9714956] 9216015 2
10 921605801 Point [10.54025, 46.38658] 9216058 3
# ℹ 740 more rows
# ℹ 93 more variables: clustering_provincia <int>, user_regione <int>,
# alias <chr>, branch <chr>, buildingRegulation_certification <int>,
# cityPlanRegulation_certification <int>, class_certification <chr>,
# code_certification <chr>, mainBody_certification <chr>,
# rebuildYear_certification <int>, category <chr>, fixedPhone_contact <chr>,
# role_contact <chr>, recycling_certification <int>, …
Cleaning and pre-processing data
At first sight, the number of columns is very high. Moreover, the original column names are in Italian. First of all, let’s translate the names.
id_cai: unique code used to identify a structure within the CAI catalog.
status: the status of the structure. It can be active, temporary inactive or not available anymore.
alias: name used to refer to the structure.
buildingRegulation_certification, cityPlanRegulation_certification, class_certification, code_certification: these columns contain data about building permits and other bureaucratic matters. They are of little interest for what we are doing here.
buildYear_certification, rebuildYear_certification: refer to the year in which the structure was built and, possibly, the year it was rebuilt.
category: within the CAI system, structures are classified into 5 categories according to how difficult they are to reach and how they are supplied. Structures labelled A can be reached by private car or are at most a 10-minute walk from a parking spot. Those labelled B can be reached by cable car or are close to a cable car station. Structures labelled C, D and E can be reached only on foot and differ in how they are supplied (motorized vehicle, cableway, helicopter).
type: the type of the structure refers to another classification:
Attended hut: accommodation facilities that arose to meet mountaineering and hiking needs. They are managed or maintained, open to the public seasonally, and arranged and organized to provide hospitality and opportunities for stopping, refreshment, overnight accommodation and related services.
Unattended hut: same as Attended hut, but nobody is in charge of managing the structure.
Bivouac: mostly prefabricated, modestly sized single-room structures with a capacity normally not exceeding 15 places. They are unattended, permanently open and equipped with what is essential for the makeshift shelter of climbers.
Social hut: available exclusively to a Branch, as owner or as holder of rights and permission of use. It has simple equipment and is generally locked, with keys available from the Branch. It is considered the summer social headquarters of a Section and can be used for membership stays or intersectional meetings.
Foot-hold: fixed structures, generally obtained by modest restoration and recovery of existing buildings typical of the mountain environment, located in an intermediate position between the valley floor and the alpine huts. They should provide shelter for mountaineers and hikers, with simple but essential equipment for overnight stays and possibly cooking and heating equipment.
Emergency recovery: unattended and permanently open structures without any equipment, used as an emergency stop.
geo_type, geo_coordinates: the pair of numbers represents a geographic point, given as [longitude, latitude], with EPSG 4326 as Coordinate Reference System (but see the section Preparing Simple Features, where EPSG 6706 is used).
authorityJurisdiction_geo, regional_commission_geo, owner, branch, branch_id, ownerRegion_geo, fixedPhone_property, email_property, webSite_property: the first and the second columns refer to authorities that are in charge in that area. The other columns give information about the CAI branch that owns the structure, along with contacts.
role_contact, fixedPhone_contact, emailAddress: role and phone number of the contact of the structure and reference email address.
recycling_certification: whether the structure applies recycling rules to manage trash.
water_availability_certification, water_type_certification: respectively, the amount of available water and the mode of how it is retrieved.
powerGenerator_certification, photovoltaic_certification, heating_type_certification: the first two are flags that indicate if the structure uses, respectively, a power generator and photo-voltaic panels. The third column specifies how the heat is produced.
fireRegulation_certification: whether the structure is suitable to face fire emergencies.
drainType_certification: how the structure drains black waters.
hot_water_service, electricity_service, shower_service, mobile_operator_service, payment_pos_service, restaurant_service, number_of_seats_restaurant, bedsheets_selling_service, kitchen_access_service, external_water_access, charge_point_bedroom_service, charge_point_commonspace_service, disabled_accessibility_service, disabled_toilet_service, mbike_accessibility_service, car_accessibility, pet_allowed_service, wifi_service, families_accessibility_service, defibrillator_service, number_common_wc: services that are available in the structure
resupply_request, pickupKey_property, self_management_property: the first column refers to the obligation to resupply the structure after using it. The second and third columns refer to the possibility for users to run the structure on their own.
asbestos_certification, oilSeparator_certification: respectively, whether the structure is certified asbestos-free and whether it has equipment to separate oil from water.
summer_resupplying_mode, winter_resupplying_mode: how the structure is resupplied in summer and in winter
number_of_beds, number_of_winter_beds, number_of_beds_management, total_number_of_beds: data about beds availability. Winter beds are intended as emergency beds to be used even when the structure is closed for winter.
The other columns are either not useful or hard to interpret.
Given that every column is a variable and each row corresponds to a particular structure/hut, we can say that the data is tidy.
Even among the columns listed above, some are of no use for the scope of this project, for example contact info and certifications related to laws and bureaucracy. Let’s keep only the meaningful columns.
While working on this project I ran into a problem with the region_geo column: one record has the value friuli Venezia Giulia, while the other records for this region have the value Friuli Venezia Giulia.
dataset <- dataset %>%
  mutate(region_geo = replace(region_geo,
                              region_geo == "friuli Venezia Giulia",
                              "Friuli Venezia Giulia"))
We also need to change some region names so that they are consistent with those in rnaturalearth.
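The renaming step is not shown in the rendered output. A minimal sketch of how it could be done with a named lookup vector follows. The two mappings are assumptions inferred from the spellings that appear in later tables (Sicily, Friuli-Venezia Giulia), and the toy `regions` vector stands in for `dataset$region_geo`.

```r
# Hypothetical lookup: dataset spelling -> spelling assumed to be used by rnaturalearth
region_lookup <- c(
  "Sicilia"               = "Sicily",
  "Friuli Venezia Giulia" = "Friuli-Venezia Giulia"
)

# Toy stand-in for dataset$region_geo
regions <- c("Sicilia", "Lombardia", "Friuli Venezia Giulia", NA)

# Replace only the values found in the lookup; other values (and NA) are left untouched
to_fix <- regions %in% names(region_lookup)
regions[to_fix] <- unname(region_lookup[regions[to_fix]])

print(regions)
```

Keeping the mapping in one named vector makes it easy to review every renamed region at a glance.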
Let’s now handle missing values. We can try to check column by column the amount of missing values and whether it is possible to retrieve the value in some particular cases.
# A tibble: 750 × 59
id_cai status alias buildYear_certificat…¹ rebuildYear_certific…² category
<int> <chr> <chr> <int> <int> <chr>
1 921200203 In at… "Cap… 1876 2018 D
2 921601501 In at… "Biv… NA NA E
3 921200201 In at… "Biv… 1918 1985 E
4 930010001 In at… "Cap… NA NA E
5 921800104 In at… "Biv… 1976 2020 E
6 921605001 In at… "Biv… 1971 2015 E
7 921200202 In at… "Cap… 1927 NA E
8 921200204 In at… "Cap… 1902 1998 E
9 921601502 In at… "Biv… NA NA E
10 921605801 In at… "Biv… 1968 NA E
# ℹ 740 more rows
# ℹ abbreviated names: ¹buildYear_certification, ²rebuildYear_certification
# ℹ 53 more variables: type <chr>, geo_type <chr>, geo_coordinates <chr>,
# region_geo <chr>, province_geo <chr>, municipality_geo <chr>,
# locality_geo <chr>, valley_geo <chr>, massif_geo <chr>,
# protected_area_geo <chr>, altitude_geo <int>, owner <chr>, branch <chr>,
# recycling_certification <int>, water_availability_certification <chr>, …
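The column-by-column check of missing values mentioned above is not shown in the rendered output. One possible way to do it in base R is sketched below; the `toy` data frame and its values are made up and only stand in for the real dataset.

```r
# Toy stand-in for the real dataset (values are made up)
toy <- data.frame(
  region_geo   = c("Piemonte", NA, "Veneto", "Lombardia"),
  valley_geo   = c(NA, NA, "Val di Fassa", NA),
  altitude_geo = c(3647L, 2100L, NA, 1500L)
)

# Number of missing values per column, from the most to the least incomplete
na_count <- sort(colSums(is.na(toy)), decreasing = TRUE)
print(na_count)

# Share of missing values per column, as a percentage
na_share <- round(100 * colMeans(is.na(toy)), 1)
print(na_share)
```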
Missing data occurs in multiple columns. The columns holding geographical information are almost always complete, except for those with more specific details, such as valley_geo and protected_area_geo. However, there are also a few missing values in region_geo and province_geo. They can be fixed manually by retrieving the data from other columns.
dataset[dataset$alias == "Bivacco Tre Confini", "region_geo"] <- "Liguria"
dataset[dataset$alias == "Rifugio Città di Novara", "province_geo"] <- "VB"
dataset[dataset$alias == "Pian della Rena", "province_geo"] <- "LI"
dataset[dataset$alias == "Rifugio Enrico Faiani", "province_geo"] <- "TE"
dataset[dataset$alias == "Rifugio Cerri di Sant'Amato", "province_geo"] <- "AV"
The column buildYear_certification also has a high number of missing values. The information in this column is interesting for understanding the evolution of CAI structures over the years, but too many missing values would prevent a meaningful result.
id_cai alias
1 921601501 Bivacco Città di Gallarate
2 930010001 Capanna Margherita
3 921601502 Bivacco Bossi
4 921600501 Bivacco Laeng
5 921600502 Bivacco Giannantonj
I manually searched the internet for the build year of these structures, and we can update the dataset with the retrieved information. Since each row of interest needs its own value, this has to be done by hand, without the aid of functions such as replace_na.
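The update code is not shown. A minimal sketch of a row-by-row update driven by a small lookup table could look like this; the ids and years below are placeholders, not the values actually retrieved, and `found` is a hypothetical helper table.

```r
# Toy stand-in for the dataset; ids and years are placeholders, not real values
toy <- data.frame(
  id_cai = c(101L, 102L, 103L, 104L),
  buildYear_certification = c(1950L, NA, NA, 1987L)
)

# Hypothetical table of manually retrieved years, keyed by id_cai
found <- data.frame(
  id_cai = c(102L, 103L),
  year   = c(1901L, 1964L)
)

# Position of each toy row in the lookup table (NA where the id is not listed)
idx <- match(toy$id_cai, found$id_cai)

# Fill only the rows that are missing a year and have a retrieved value
fill <- is.na(toy$buildYear_certification) & !is.na(idx)
toy$buildYear_certification[fill] <- found$year[idx[fill]]

print(toy)
```

Rows that already have a year are never overwritten, so the lookup table can be extended safely as more years are found.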
Many records have a missing value for electricity_service while photovoltaic_certification is set to 1. It is reasonable to assume that a structure with photovoltaic panels also has electricity.
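As a sketch of this rule, assuming both columns are stored as 0/1 integers (the toy values below are made up):

```r
# Toy stand-in for the two columns involved (values are made up)
toy <- data.frame(
  photovoltaic_certification = c(1L, 1L, 0L, NA),
  electricity_service        = c(NA, 1L, NA, NA)
)

# Rows where electricity is unknown but photovoltaic panels are reported.
# %in% treats a missing photovoltaic flag as "not 1", so those rows stay NA.
infer <- is.na(toy$electricity_service) & toy$photovoltaic_certification %in% 1L
toy$electricity_service[infer] <- 1L

print(toy)
```

Only the first row is filled in: the third has no panels and the fourth has no information at all, so both remain missing.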
Moreover, a lot of missing data occurs in the columns that refer to services or certifications. These columns are generally Boolean: the service (or certification) is either present or not. Let’s see what happens to the number of missing values in these columns if we remove the rows that refer to structures of type Bivouac.
Warning in geom_col(alpha = 0.5, binwidth = 0.1, position = "dodge"): Ignoring
unknown parameters: `binwidth`
Scale for fill is already present.
Adding another scale for fill, which will replace the existing scale.
The number of missing values is substantially reduced. The point is that in structures such as bivouacs, designed to be simple and to be used in emergencies, comfort services are almost always absent. So, even if data about services and certifications is missing for many Bivouac records, we can assume that they simply do not offer those services. Moreover, the catalog is meant to be consulted through a user interface that, for a given structure, lists only the available services. This gives us more confidence that replacing missing values with 0 (False) in these columns is a reasonable choice.
Let’s convert the columns about services and certifications to Boolean.
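The conversion code is not shown. A minimal base R sketch of the two steps (missing values to 0, then 0/1 to logical) follows; the toy data frame and the `service_cols` vector are illustrative, and the real list of flag columns would be longer.

```r
# Toy stand-in with two 0/1 flag columns (values are made up)
toy <- data.frame(
  alias          = c("Hut A", "Bivouac B", "Hut C"),
  wifi_service   = c(1L, NA, 0L),
  shower_service = c(NA, NA, 1L),
  stringsAsFactors = FALSE
)

# Illustrative subset of the flag columns to convert
service_cols <- c("wifi_service", "shower_service")

# Treat a missing flag as "not available" (0), then convert 0/1 to FALSE/TRUE
toy[service_cols] <- lapply(toy[service_cols], function(x) {
  x[is.na(x)] <- 0L
  as.logical(x)
})

print(toy)
```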
In many columns the values are written in Italian. Let’s translate them into English.
dataset$status <- recode(dataset$status,
  "In attività" = "In business",
  "Alienato" = "Alienated",
  "Rudere" = "Ruin",
  "Immobile non agibile" = "Property not habitable",
  "In ristrutturazione" = "Under renovation",
  "Momentaneamente chiuso" = "Temporarily closed",
  "Cessazione affido" = "Terminated job")
dataset$summer_resupplying_mode <- recode(dataset$summer_resupplying_mode,
  "elicottero" = "Helicopter",
  "a spalla" = "Human",
  "mezzi di trasporto individuale a 4 ruote" = "4-wheels vehicle",
  "mezzi di trasporto individuale a 2 ruote e quoad" = "2-wheels vehicle / quad",
  "mulo cavalallo e similari" = "Pack animals",
  "teleferica" = "Ropeway",
  "funivia di carosello" = "Cable car",
  "motocarriola" = "Powered wheelbarrow")
dataset$winter_resupplying_mode <- recode(dataset$winter_resupplying_mode,
  "elicottero" = "Helicopter",
  "a spalla" = "Human",
  "mezzi di trasporto individuale a 4 ruote" = "4-wheels vehicle",
  "mezzi di trasporto individuale a 2 ruote e quoad" = "2-wheels vehicle / quad",
  "teleferica" = "Ropeway",
  "funivia di carosello" = "Cable car",
  "motocarriola" = "Powered wheelbarrow")
Visualizations
How many CAI structures?
Let’s start by taking a look at how many CAI structures there are in each region. Regions with a small number of structures are collapsed in a unique category to make the visualization more meaningful.
# Counting the number of CAI structures per region
region_count <- dataset %>%
  group_by(region_geo) %>%
  summarise(count = n())
viz <- region_count

# Ordering regions by number of CAI structures
viz$region_geo <- factor(viz$region_geo,
                         levels = viz$region_geo[order(viz$count, decreasing = FALSE)])

# Collapsing regions with fewer than 10 structures into a single group
threshold <- 10
viz <- viz %>%
  count(region_geo = fct_collapse(region_geo, Other = unique(region_geo[count < threshold])),
        wt = count)

# Plotting
ggplot(viz, aes(y = region_geo, x = n)) +
  geom_col(color = "white", fill = "#0095DB") +
  geom_text(aes(label = n), hjust = 1.2, color = "white", fontface = "bold") +
  theme(plot.title = element_text(face = "bold", color = "#0095DB", size = 18, hjust = 0.5),
        text = element_text(color = "#404040"),
        axis.title = element_text(color = "#212326", size = 14, hjust = 0.5),
        axis.text = element_text(size = 12),
        panel.background = element_rect(fill = "#212326"),
        panel.grid = element_line(color = "white")) +
  labs(title = "Number of CAI structures per Region", y = "Region", x = "# CAI structures")
How to read: this is a bar plot that shows the number of CAI structures per region. The y-axis contains the regions and the x-axis holds a numerical scale. The Other group sums the regions with fewer than 10 CAI structures.
We observe that the top 3 regions by number of CAI structures are Lombardia, Piemonte and Trentino-Alto Adige. These regions are located in the north of Italy, where the Alps and the Dolomiti massifs dominate. The top region from the center of Italy is Abruzzo, particularly famous for the Gran Sasso massif.
Let’s visualize this on a map. However, this time we plot it province-wise.
# Count the number of CAI structures per province
province_count <- dataset %>%
  select(province_geo) %>%
  group_by(province_geo) %>%
  summarise(count = n())
# Retrieve the maps of Italy: one at region level and one at province level
italy_regions <- ne_states(country = "italy", returnclass = "sf") %>%
  mutate(area = st_area(geometry)) %>%
  group_by(region) %>%
  summarise(area = set_units(sum(area), "km^2"))
although coordinates are longitude/latitude, st_union assumes that they are
planar
italy_provinces <- ne_states(country = "italy", returnclass = "sf") %>%
  mutate(area = st_area(geometry)) %>%
  select(iso_3166_2)

# Modify the province codes to make them consistent with those in our dataset
italy_provinces$iso_3166_2 <- sub("IT-", "", italy_provinces$iso_3166_2)

# Join the provinces with our counts to get both the counts and the geometry
italy_provinces <- left_join(italy_provinces, province_count,
                             by = c("iso_3166_2" = "province_geo"))

# Plotting
ggplot() +
  geom_sf(data = italy_provinces, aes(fill = count), color = "black", linewidth = 0.03) +
  geom_sf(data = italy_regions, color = "black", linewidth = 1, alpha = 0) +
  scale_fill_distiller(palette = "YlGn", direction = 1, na.value = "grey",
                       breaks = c(0, 10, 30, 50, 70)) +
  labs(title = 'Number of CAI structures per Province', fill = "# CAI structures") +
  theme(plot.title = element_text(face = "bold", size = 18, color = "#006837"),
        panel.grid = element_line(color = "#000000"),
        panel.background = element_rect(fill = "#404040"),
        legend.title = element_text(size = 14, color = "#212326"))
In this map of Italy, each province is colored according to the number of CAI structures in it. Darker green means many structures and lime green means fewer. Bold black lines mark the regional borders.
These results show that, although Lombardia and Piemonte are the regions with the most CAI structures, there is a high concentration of CAI mountain huts in the provinces of Trento (in Trentino-Alto Adige) and Belluno (in Veneto). Both are famous for the Dolomiti massif. We can also notice that, in some of the mentioned regions, there are grey areas corresponding to provinces that lie entirely in the plains. As for the center of Italy, the colored areas generally overlap with the long mountain range of the Appennini.
Elevation of structures
Another important piece of information in this dataset is the elevation of the geographical points where the structures are located. Let’s plot the density distribution of altitude_geo for each region. Again, we collapse regions with very few structures (<10) into a single group.
viz <- dataset %>%
  select(c(alias, region_geo, altitude_geo, protected_area_geo))

# Again, collapse regions with fewer than 10 structures into a single group
viz <- viz %>%
  left_join(region_count, by = "region_geo") %>%
  mutate(region_category = ifelse(count < threshold, "Other", as.character(region_geo))) %>%
  select(-count)

# Computing, for each region, the median altitude of its structures
temp <- viz %>%
  group_by(region_category) %>%
  summarise(median_altitude = median(altitude_geo))

# Ordering the groups by the median
viz$region_category <- factor(viz$region_category,
                              levels = temp$region_category[order(temp$median_altitude, decreasing = TRUE)])

# Plotting
ggplot(viz, aes(x = altitude_geo, y = region_category, fill = after_stat(x))) +
  geom_density_ridges_gradient(scale = 1.6, rel_min_height = 0.01, gradient_lwd = 1,
                               jittered_points = TRUE,
                               position = position_points_jitter(width = 0.05, height = 0),
                               point_shape = '|', point_size = 3, point_alpha = 1,
                               alpha = 0.7) +
  scale_x_continuous(limits = c(0, 5000)) +
  scale_fill_viridis(name = "Altitude [m]", option = "D", limits = c(0, 4500)) +
  theme_ridges(font_size = 13, grid = TRUE) +
  theme(text = element_text(color = "#212326"),
        plot.title = element_text(size = 18, hjust = 0.5, face = "bold", color = "#990000"),
        axis.title = element_text(size = 14),
        axis.text = element_text(size = 12),
        panel.background = element_rect(fill = "#F4F4F4", colour = "#404040")) +
  labs(title = "Distribution of altitudes of CAI structure per region",
       x = "Altitude [m]", y = "Region")
Picking joint bandwidth of 179
In this plot we can check the distribution of altitudes for each region. The x-axis has a numerical scale ranging from 0 to 5000 and the y-axis lists the regions. Within a region, each small black segment represents a structure and the density curve represents the distribution of the altitudes. The fill aesthetic is also mapped to the altitude. One could say this is redundant, but I think the color helps to discriminate between altitude ranges and to compare regions.
We can see that Valle d’Aosta is the region where the structures at the highest altitudes are found. This region is surrounded by the 4 biggest massifs of Italy: Monte Bianco, Cervino, Monte Rosa and Gran Paradiso. However, the CAI structure with the highest altitude is actually located in Piemonte.
The highest CAI hut in Italy is the Capanna Margherita, in Piemonte on Alpi Pennine with an altitude of 4554 meters.
In general, Trentino-Alto Adige, Piemonte, Lombardia and Veneto behave similarly: the most frequent altitudes range between 2000 and 2500 meters. On the other hand, we can clearly see a different pattern for the regions of central Italy (Abruzzo, Emilia-Romagna and Toscana), where altitudes are concentrated below 2000 meters. The Other group mostly contains regions from central and southern Italy, with very few CAI structures and altitudes generally centered around 1000 meters. Interestingly, this category also contains three structures at very high altitudes. In fact, two of them are not located in Italy.
alias region_geo altitude_geo
1 Bivacco Musso Svizzera 3664
2 Osservatorio Pizzi Deneri Sicily 2850
3 Bivacco Vacca Savoia Francia 2670
Bivacco Musso and Bivacco Vacca are located, respectively, in Switzerland and in the Savoie region of France, but they do belong to CAI! The other structure is located in Sicily, more precisely in the area of Parco dell’Etna, and it is a volcanological observatory.
On the other hand, we can mention those observations where altitude_geo is very low.
alias altitude_geo region_geo
1 Rifugio Casetto 4 Emilia-Romagna
2 Casa Cadorna 118 Friuli-Venezia Giulia
3 Rifugio Premuda 80 Friuli-Venezia Giulia
4 Pian della Rena 10 Toscana
One would not expect to find mountain huts at such low altitudes. Rifugio Casetto, in particular, is an old guard house built during the clean-up operation of the area surrounding the river Reno. The municipality of Argenta decided to grant the property to the local CAI section, and the hut is now, to all intents and purposes, the lowest mountain hut in Italy!
CAI structures in history
It can be interesting to observe how CAI structures spread over the years. To do so, we use the column buildYear_certification, which holds the year in which CAI built or acquired (and adapted) a structure to be used as a starting, intermediate, arrival or recovery point for climbs, or as a support point for social activities.
Keep in mind that CAI was founded in 1863 by Quintino Sella, who served as Minister of Finance of the Kingdom of Italy: the idea dates to his ascent of Monviso on 12 August, and the club was formally established in Turin on 23 October.
alias buildYear_certification branch
1 Rifugio Teodulo 1852 Torino
2 Rifugio Sella 1861 Biella
3 Baita Pianello 1680 Borgomanero
4 Ca' di Rossi 1700 Forlì
5 Bivacco Varnerin 1600 San Vito al Tagliamento
6 Malga Pavarì 1800 Salò
7 Casetta Ciccaglia 1844 Perugia
As we can see, these CAI structures were built well before 1863. It seems that, for some records, the column buildYear_certification refers to the year in which the building itself was built. In some of these cases, the buildings were not originally built for CAI activities: at some point they were granted to the club and started being used for new purposes. For example, the original Rifugio Teodulo was built in 1852 and was the first mountain hut in the Alps. The property was then sold to CAI in 1920 and the current Rifugio Teodulo was built there from scratch. For records with buildYear_certification lower than 1863, it would make more sense to replace the value with the year in which the club came to own the structure and started using it for its purposes. In general, this is already what the column means for all the other records in the dataset, where buildYear_certification is greater than 1863.
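As a small illustration of how such records could be flagged for manual review, here is a base R sketch. The first two rows reuse years from the table above, the third row is made up, and the `needs_review` column is a hypothetical helper, not part of the dataset.

```r
# Year in which CAI was founded
cai_founding_year <- 1863L

# Toy stand-in: two rows from the table above plus one made-up row
toy <- data.frame(
  alias = c("Rifugio Teodulo", "Rifugio Sella", "Example hut"),
  buildYear_certification = c(1852L, 1861L, 1950L)
)

# Flag build years that predate the club, so they can be checked by hand
toy$needs_review <- !is.na(toy$buildYear_certification) &
  toy$buildYear_certification < cai_founding_year

print(toy[toy$needs_review, ])
```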
This bar plot is meant to show the evolution of the building activity of CAI over time. The range of Year values on x-axis is binned by intervals of 4 years.
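To make the idea of 4-year bins concrete, here is a rough base R sketch with made-up years. It only illustrates the grouping: ggplot2’s stat_bin may place the bin edges differently, so the counts per bar need not match a plot exactly.

```r
# Made-up build years
years <- c(1866L, 1868L, 1875L, 1901L, 1902L, 1903L)
bin_width <- 4L

# Start year of the 4-year interval each value falls into
bin_start <- (years %/% bin_width) * bin_width

# Number of structures per interval
counts <- table(bin_start)
print(counts)
```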
The history of Italian mountaineering coincides, to all intents and purposes, with the history of CAI. The evolution of mountaineering and the ever stronger interest in the high peaks of the Alps over the course of the 1800s highlighted the need for fixed bases to access the summits. The club answered this need and, right after its foundation, started to build many alpine huts at the turn of the 1800s and 1900s. The first half of the 1900s was a golden age of alpinism, a period of intense exploration and summit conquests. Notice how, in this first half, there is a sudden decline right before 1920 and around 1940: these two moments coincide with the two World Wars. Especially during the Second World War, alpine activity manifested itself in other ways: after 8 September 1943, hut keepers and CAI guides actively took part in and supported the operations of Resistance patriots, and kept connections open between the plains and the mountains. Unfortunately, this made the shelters the target of acts of vandalism by Nazi-Fascists and stragglers and, by the end of the conflict, 81 structures belonging to 35 sections had been totally destroyed and 156 more had suffered partial damage. The spike around 1950 may be due to the great work of reconstruction and restoration of some of the structures. In general, the second half of the last century, overlapping with the post-World War II era, holds the majority of the data. This could reflect the fact that the Alps became a playground for outdoor enthusiasts, boosting tourism in that era and the need for more mountain huts to host visitors. However, if we look at this data from another point of view, we can observe something different.
The majority of structures are of type Attended hut or Bivouac. Let’s filter by these two types.
viz <- dataset %>%
  filter(!is.na(buildYear_certification)) %>%
  filter(type %in% c("Bivouac", "Attended hut"))

# Plotting
ggplot(viz, aes(x = buildYear_certification, fill = type)) +
  stat_bin(binwidth = 4, color = "black") +
  labs(title = "CAI structures built throughout the years",
       x = "Year",
       y = "# Mountain Huts built",
       fill = "Type of structure") +
  scale_x_continuous(breaks = seq(1880, 2020, 20)) +
  scale_fill_manual(values = c("#69b3a2", "#ffcc00")) +
  theme(text = element_text(color = "#212326"),
        plot.title = element_text(size = 18, hjust = 0.5, face = "bold", color = "#B46A96"),
        axis.title = element_text(size = 14),
        axis.text = element_text(size = 12),
        panel.grid.major = element_line(color = "#DDDDDD"),
        panel.grid.minor = element_blank(),
        panel.background = element_rect(fill = "#F4F4F4", colour = "#404040"),
        legend.position = "top")
This graph is very similar to the previous one, but here two colors are used to differentiate between structures of type Attended hut and Bivouac. The bars are stacked.
It is generally true that the ever stronger interest of casual visitors in the Italian mountains created a new business of tourism and hosting structures in these areas, but this business was pursued mostly by private entities. The philosophy of CAI is the knowledge and study of mountains and the defense and safeguarding of their natural environment. The club has no interest in building hosting structures to make money out of tourists. By definition, a mountain hut is an accommodation facility (not a hotel) at high altitude that constitutes a public utility.
In fact, we can observe that the majority of CAI structures built in the second half of the last century are Bivouacs. The trend of building this kind of structure started around 1920. Bivouacs are regarded as essential shelters, designed according to the minimalist concept of using only the space necessary for survival, in contrast to the classic huts. This evolution is consistent with CAI’s promotion of sustainable practices in the landscape of the Italian mountains.
Preparing Simple Features
Now, let’s process the information in the columns geo_type and geo_coordinates so that we can use the simple features library to deal with spatial data.
# Two functions to retrieve the two coordinates from the string in geo_coordinates
# custom_split1 returns the first value of the pair (the longitude, as the printed geometries show)
custom_split1 <- function(arg1) {
  arg1 <- arg1 %>% str_sub(2, -2)   # drop the surrounding square brackets
  arg1 <- strsplit(arg1, ", ")
  return(as.numeric(arg1[[1]][1]))
}

# custom_split2 returns the second value of the pair (the latitude)
custom_split2 <- function(arg1) {
  arg1 <- arg1 %>% str_sub(2, -2)
  arg1 <- strsplit(arg1, ", ")
  return(as.numeric(arg1[[1]][2]))
}

geo_ds <- dataset %>%
  mutate(N = lapply(geo_coordinates, custom_split1), .before = 8) %>%
  mutate(E = lapply(geo_coordinates, custom_split2), .before = 9)
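As an alternative sketch, the same strings can be parsed in base R without list columns. The two sample strings are taken from the first rows of the dataset printed earlier; the names `lon` and `lat` are mine and reflect the [longitude, latitude] order visible in the bounding box further down.

```r
# Two coordinate strings as they appear in geo_coordinates
coords <- c("[7.849818, 45.899672]", "[10.594504, 46.098845]")

# Drop the square brackets, then split each string on the comma
clean <- gsub("\\[|\\]", "", coords)
parts <- strsplit(clean, ",\\s*")

# First value of each pair is the longitude, second is the latitude
lon <- vapply(parts, function(p) as.numeric(p[1]), numeric(1))
lat <- vapply(parts, function(p) as.numeric(p[2]), numeric(1))

print(data.frame(lon, lat))
```

Using vapply returns plain numeric vectors, which are easier to inspect and summarise than the list columns produced by lapply.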
Now that we have a column for N and a column for E, we can use the st_as_sf function to create a simple feature collection whose geometry is the set of points representing the geographical position of the structures. By analyzing the format of the geo_coordinates column, consulting a document that lists the EPSG codes adopted by Italy and checking on epsg.io, I concluded that the coordinates were produced using EPSG 6706, so I used it for the crs argument of st_as_sf.
Geometry set for 750 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6.756183 ymin: 37.70028 xmax: 40 ymax: 79
Geodetic CRS: +proj=longlat +ellps=GRS80 +no_defs +type=crs
First 5 geometries:
POINT (7.849818 45.89967)
POINT (7.877294 45.95197)
POINT (7.858085 45.91364)
POINT (7.877015 45.92707)
POINT (10.5945 46.09884)
An animation of CAI Structures in Italy
Initially, my idea was to produce an interactive plot with a slider to move through the years. For each year, a map of the country would show dots of one color for the CAI structures built before that year and dots of another color for the structures built in that very year. It was meant to visualize the geographical distribution of the structures and how it evolved over time. I tried to use the plotly package and exploit the unofficial frame aesthetic by mapping buildYear_certification to it. However, this produced an interactive plot in which each frame showed only the structures built in that year. I have not found a way to also plot, in each frame, those built in previous years. Then I tried the gganimate package to produce a non-interactive animation of the desired result but, because of some clashes between this package and simple feature objects, I did not manage to obtain one. Finally, I decided to use a “brute force” approach and manually generate each frame with the ggsave function. From the resulting images I generated a .gif using the gifski package, and I inserted it below.
The interactive slider would have made this plot more usable. For example, by hovering over a point, it would have been possible to check the name and the basic information of a hut built in a certain year.
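One possible workaround for the frame limitation described above is to duplicate the rows, so that each frame carries every structure built up to that year. The sketch below shows only this data preparation step on made-up values; the helper accumulate_by_year and the frame and status columns are hypothetical names, and the approach has not been verified with plotly and simple feature objects.

```r
# Toy data: the column name matches the dataset, the values are made up
huts <- data.frame(
  alias = c("a", "b", "c", "d"),
  buildYear_certification = c(1900, 1900, 1950, 2000),
  stringsAsFactors = FALSE
)

# For every frame year, keep all structures built up to and including that
# year and tag each row with the frame it belongs to
accumulate_by_year <- function(df, year_col) {
  years <- sort(unique(df[[year_col]]))
  frames <- lapply(years, function(y) {
    rows <- df[df[[year_col]] <= y, , drop = FALSE]
    rows$frame <- y
    rows$status <- ifelse(rows[[year_col]] == y,
                          "Built in current year",
                          "Built before current year")
    rows
  })
  do.call(rbind, frames)
}

frames <- accumulate_by_year(huts, "buildYear_certification")
print(table(frames$frame))
```

The frame column could then be mapped to the frame aesthetic and status to the color, at the cost of a table that grows with the number of years.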
In general, we notice once again that the area of greatest interest for the club is the north of Italy, with the Alps and the Dolomites. However, we can also notice that, although less frequently, CAI started building huts in the more modest Apennines right after its foundation. It may be interesting to know that the oldest CAI hut still active is Capanna Gniffetti, on Monte Rosa. To be fair, the first hut ever built by CAI is Alpetto Recovery, built in 1866 to support ascents of Monviso and replaced in 1998 by the current Rifugio Alpetto.
For completeness, here is the code I used to generate the images, save them and assemble them into the .gif. It can be re-run to check the reproducibility of the result. Notice that I filtered out Bivacco Tre Confini because its coordinates are not consistent with the CRS of the coordinates of all the other structures.
viz <- geo_ds %>%
  filter(!is.na(buildYear_certification)) %>%
  filter(alias != "Bivacco Tre Confini")

# Given a year, the function returns a plot for that year
get_map <- function(y) {
  filtered_data <- viz %>% filter(buildYear_certification == y)
  cumulative_data <- viz %>% filter(buildYear_certification < y)
  ggplot() +
    geom_sf(data = italy_regions, fill = "#f7eb07", color = "grey", linewidth = 0.5) +
    geom_sf(data = cumulative_data, mapping = aes(color = "cumulative"), alpha = 0.7, size = 1.5) +
    geom_sf(data = filtered_data, mapping = aes(color = "filtered"), size = 4) +
    labs(title = paste("CAI structures in", as.character(y))) +
    scale_colour_manual(
      name = "Legend",
      values = c("cumulative" = "#07c7f7", "filtered" = "#1b07f7"),
      labels = c("Built before current year", "Built in current year")
    ) +
    theme(
      text = element_text(color = "#212326"),
      panel.grid.major = element_line(color = "#DDDDDD"),
      plot.title = element_text(size = 25, face = "bold"),
      legend.text = element_text(size = 13),
      legend.title = element_text(size = 15),
      panel.background = element_rect(fill = "#F4F4F4", colour = "#404040")
    )
}

# A plot for each year is generated and the images are saved
y_list <- viz$buildYear_certification %>% unique() %>% sort()
my_maps <- paste0("./temp/m_", seq_along(y_list), ".png")
for (i in seq_along(y_list)) {
  # Pass the plot explicitly instead of relying on the last plot created
  p <- get_map(y = y_list[i])
  ggsave(my_maps[i], plot = p, width = 8, height = 8)
}

png_files <- list.files("./temp", pattern = ".*png$", full.names = TRUE)

# Images are ordered by date of creation
details <- file.info(png_files)
details <- details[with(details, order(as.POSIXct(ctime))), ]
files <- rownames(details)

# Generation and saving of the GIF
gifski(files, gif_file = "./animation.gif", width = 680, height = 680, delay = 0.5)
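The ordering by creation time above is needed because names such as m_2.png and m_10.png do not sort in frame order alphabetically. As a minimal alternative sketch, the index can be zero-padded so that the alphabetical order returned by list.files already matches the frame order. The three-digit width is an assumption (fewer than 1000 frames) and the number of frames is made up.

```r
n_frames <- 12  # made-up number of frames

# Names as built in the code above: "m_2" sorts after "m_10"
unpadded <- paste0("./temp/m_", seq_len(n_frames), ".png")

# Zero-padded names: alphabetical order equals frame order
padded <- sprintf("./temp/m_%03d.png", seq_len(n_frames))

print(head(sort(unpadded), 4))
print(head(sort(padded), 4))
```

With padded names the file list can be passed to gifski directly after sorting, without depending on file system timestamps.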
Categorization of structures
As explained in the first part, this dataset has two columns that categorize CAI structures in two different ways. type describes the kind of structure in terms of complexity and functionality. category assigns a class based on two parameters: how easily a hiker can reach the structure and the way in which it can be resupplied. Basically, a class A structure can be reached by guests with their private car, while a class E structure requires roughly 4 hours of walking and is resupplied by cable-way or helicopter.
It can be interesting to visualize the relation between these two categorizations and the altitude of the structures.
# Lookup list to rename some columns for better readability in the interactive plot
lookup <- c("Type" = "type", "Altitude" = "altitude_geo", "Reachability_class" = "category")

viz <- dataset %>%
  filter(!is.na(category)) %>%
  select(c(altitude_geo, category, type)) %>%
  rename(all_of(lookup))

# Plotting
p <- ggplot(viz, aes(x = Reachability_class, y = Altitude, color = Type)) +
  geom_jitter(alpha = 0.7, width = 0.35) +
  scale_color_brewer(palette = "Dark2") +
  labs(
    title = "Relations between categorization and altitude of structures",
    x = "Reachability class",
    y = "Altitude (m)",
    color = "Type of structure"
  ) +
  theme(
    text = element_text(color = "#212326"),
    plot.title = element_text(size = 12, face = "bold"),
    panel.grid = element_line(color = "#DDDDDD"),
    panel.background = element_rect(fill = "#F4F4F4", color = "#404040")
  )

ggplotly(p, tooltip = c("Type", "Altitude", "Reachability_class"))
In this plot, the x-axis represents the reachability class, while the y-axis represents the altitude. The color aesthetic is mapped to the type of structure. Each dot represents a structure. The plot is interactive: it is possible to control which Type to show by selecting it in the legend on the right, and hovering over a point displays the three pieces of information shown in this plot.
We notice right away that, as we go from class A to class E, the altitude of the structures increases. Generally, high-altitude areas are more difficult to reach and to connect, even with classic mountain infrastructure such as dirt roads and cable-ways. We can also see that almost all structures of type Bivouac are classified as E. This makes sense, since bivouacs are meant to be unattended structures built in isolated areas as makeshift shelters. Attended hut is the classic building that comes to mind when one thinks about mountains. We find many structures of that kind in class A, but the majority are classified as C and D, meaning that they require a walk of 10 minutes to 2 hours and of 2 to 4 hours, respectively, which matches the typical experience of the average hiker. Structures of type Social hut show a similar behavior, since they are the external bases of the various CAI branches, where the members of a branch gather for special occasions. Unattended huts follow the same pattern.
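The observations above are read off the jitter plot by eye; a cross-tabulation of the two categorizations can back them with numbers. The sketch below uses a made-up toy data frame with the same column names as the dataset, so the counts it prints are purely illustrative.

```r
# Toy data with the dataset's column names; the values are made up
toy <- data.frame(
  type = c("Bivouac", "Bivouac", "Attended hut", "Attended hut",
           "Attended hut", "Social hut"),
  category = c("E", "E", "A", "C", "D", "C"),
  stringsAsFactors = FALSE
)

# Number of structures for each combination of type and class
counts <- table(toy$type, toy$category)

# Share of each class within a type (every row sums to 1)
shares <- prop.table(counts, margin = 1)

print(counts)
print(round(shares, 2))
```

On the real data the same two calls would be applied to dataset$type and dataset$category after dropping the rows where category is missing.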
Services
CAI structures are meant to host mountain enthusiasts and offer them a safe place to rest. It can be interesting to visualize what kinds of services are available in these buildings.
First, let’s visualize how the different kinds of structures behave in terms of offered services.
num_services <- dataset %>%
  select(c(
    hot_water_service, electricity_service, shower_service,
    payment_pos_service, restaurant_service, bedsheets_selling_service,
    kitchen_access_service, external_water_access,
    disabled_accessibility_service, disabled_toilet_service,
    mbike_accessibility_service, car_accessibility, pet_allowed_service,
    wifi_service, families_accessibility_service, defibrillator_service
  )) %>%
  rowSums()

viz <- dataset %>% mutate(tot_services = num_services)

# Plotting
ggplot(viz, aes(x = type, y = tot_services, fill = type)) +
  geom_violin(adjust = 2) +
  scale_fill_brewer(palette = "Dark2") +
  theme(
    legend.position = "none",
    text = element_text(color = "#212326"),
    plot.title = element_text(size = 18, hjust = 0.5, face = "bold"),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12),
    panel.grid.major = element_line(color = "#DDDDDD"),
    panel.grid.minor = element_blank(),
    panel.background = element_rect(fill = "#F4F4F4", colour = "#404040")
  ) +
  labs(
    title = "Relation between services and type of structure",
    x = "Type of structure",
    y = "# of services offered"
  )
This is a violin plot. For each type of structure on the x-axis, it shows the distribution of the number of services offered on the y-axis.
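The number of services behind this plot is a row-wise sum of TRUE/FALSE flags. The small sketch below shows how such a count behaves on made-up flags, including the case of a missing value; whether the real columns contain missing values is an assumption that has not been checked here.

```r
# Toy service flags; column names come from the dataset, values are made up
flags <- data.frame(
  hot_water_service   = c(TRUE, FALSE, NA),
  electricity_service = c(TRUE, TRUE, FALSE),
  wifi_service        = c(FALSE, FALSE, FALSE)
)

# Each TRUE counts as 1; a single NA makes the whole row total NA
strict_total <- rowSums(flags)

# Treating missing flags as "service not offered"
lenient_total <- rowSums(flags, na.rm = TRUE)

print(strict_total)
print(lenient_total)
```

The choice between the two matters for the violin plot, because rows with an NA total are dropped from the distribution while rows with a lenient total are kept.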
Consistent with the previous considerations, bivouacs tend to be more basic and, most of the time, do not offer any particular comfort. On the other hand, structures of type Attended hut are more likely to offer services and comfort to visitors, since they serve a different purpose from bivouacs. Foot-holds are very similar to bivouacs in terms of functionality and offered services.
Sometimes, bivouacs and foot-holds do offer some services. Let’s visualize which simple services we can find in structures of these kinds.
lookup <- c(
  "Hot Water" = "hot_water_service",
  "Electricity" = "electricity_service",
  "Shower" = "shower_service",
  "POS" = "payment_pos_service",
  "Restaurant" = "restaurant_service",
  "Bedsheets Selling" = "bedsheets_selling_service",
  "Kitchen available" = "kitchen_access_service",
  "External water source" = "external_water_access",
  "Disabled accessibility" = "disabled_accessibility_service",
  "Disabled toilet" = "disabled_toilet_service",
  "Mountain bike accessibility" = "mbike_accessibility_service",
  "Car accessibility" = "car_accessibility",
  "Pet allowed" = "pet_allowed_service",
  "Wifi service" = "wifi_service",
  "Suitable for families" = "families_accessibility_service",
  "Defibrillator" = "defibrillator_service",
  "Charge point" = "charge_point"
)

list_services <- c(
  "hot_water_service", "electricity_service", "shower_service",
  "payment_pos_service", "restaurant_service", "bedsheets_selling_service",
  "kitchen_access_service", "external_water_access",
  "disabled_accessibility_service", "disabled_toilet_service",
  "mbike_accessibility_service", "car_accessibility", "pet_allowed_service",
  "wifi_service", "families_accessibility_service", "defibrillator_service",
  "charge_point_bedroom_service", "charge_point_commonspace_service"
)

services <- dataset %>%
  filter(type %in% c("Bivouac", "Foot-hold")) %>%
  select(all_of(list_services))
services$charge_point <- services$charge_point_bedroom_service |
  services$charge_point_commonspace_service

percentage_true <- services %>%
  rename(all_of(lookup)) %>%
  colMeans() * 100
percentage_true <- data.frame(service = names(percentage_true), percentage = percentage_true) %>%
  filter(percentage > 2)
percentage_true$service <- factor(
  percentage_true$service,
  levels = percentage_true$service[order(percentage_true$percentage, decreasing = FALSE)]
)

ggplot(percentage_true, aes(y = service, x = percentage)) +
  geom_segment(aes(yend = service, xend = 0), color = "skyblue", linewidth = 0.9) +
  geom_point(color = "blue", size = 3) +
  labs(
    title = "Popularity of services",
    subtitle = "(Only bivouacs and footholds)",
    y = "Service",
    x = "% of CAI structure with \nthis service"
  ) +
  theme(
    text = element_text(color = "#212326"),
    plot.title = element_text(size = 18, face = "bold", color = "#EF9001"),
    plot.subtitle = element_text(color = "#EF9001"),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12),
    panel.grid.major = element_line(color = "#DDDDDD"),
    panel.grid.minor = element_blank(),
    panel.background = element_rect(fill = "#F4F4F4", colour = "#404040")
  )
This is a lollipop plot. The y-axis lists the different services and the x-axis holds the percentage values.
We can see that bivouacs sometimes have an external water source nearby, that they may have enough electricity to power a small light, and that they may have a stove on which it is possible to cook some simple food.
What about the other structures?
services <- dataset %>%
  filter(!(type %in% c("Bivouac", "Foot-hold"))) %>%
  select(all_of(list_services))
services$charge_point <- services$charge_point_bedroom_service |
  services$charge_point_commonspace_service

percentage_true <- services %>%
  rename(all_of(lookup)) %>%
  colMeans() * 100
percentage_true <- data.frame(service = names(percentage_true), percentage = percentage_true) %>%
  filter(percentage > 2)
percentage_true$service <- factor(
  percentage_true$service,
  levels = percentage_true$service[order(percentage_true$percentage, decreasing = FALSE)]
)

ggplot(percentage_true, aes(y = service, x = percentage)) +
  geom_segment(aes(yend = service, xend = 0), color = "skyblue", linewidth = 0.9) +
  geom_point(color = "blue", size = 3) +
  labs(
    title = "Popularity of services",
    subtitle = "(Only attended huts, unattended huts and social huts)",
    y = "Service",
    x = "% of CAI structure with this service"
  ) +
  theme(
    text = element_text(color = "#212326"),
    plot.title = element_text(size = 18, face = "bold", color = "#EF9001"),
    plot.subtitle = element_text(color = "#EF9001"),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12),
    panel.grid.major = element_line(color = "#DDDDDD"),
    panel.grid.minor = element_blank(),
    panel.background = element_rect(fill = "#F4F4F4", colour = "#404040")
  )
In general, for the other structures, Electricity is almost always offered, along with the Restaurant service and the possibility of taking a cold shower. Less frequent services include accessibility for disabled people, which is understandable, since mountain areas are not ideal for people with mobility problems.
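The two lollipop plots share the same preparation step: the share of TRUE values per service, a 2% cut-off and an ordering by percentage. As a minimal sketch, that step could live in one helper applied to both groups of structures. The name service_percentages and the toy flags are hypothetical, and na.rm = TRUE is an assumption that differs from the original colMeans() call.

```r
# Hypothetical helper: percentage of TRUE per column, filtered and ordered
service_percentages <- function(flags, min_pct = 2) {
  pct <- colMeans(flags, na.rm = TRUE) * 100
  out <- data.frame(service = names(pct), percentage = unname(pct),
                    stringsAsFactors = FALSE)
  out <- out[out$percentage > min_pct, , drop = FALSE]  # drop rare services
  out <- out[order(out$percentage), , drop = FALSE]     # ascending order
  out$service <- factor(out$service, levels = out$service)
  rownames(out) <- NULL
  out
}

# Toy flags for four made-up structures
toy_flags <- data.frame(
  Electricity = c(TRUE, TRUE, TRUE, FALSE),
  Restaurant  = c(TRUE, TRUE, FALSE, FALSE),
  Wifi        = c(FALSE, FALSE, FALSE, FALSE)
)

result <- service_percentages(toy_flags)
print(result)
```

The returned data frame has the same two columns used by the plotting code, so it could be passed to ggplot unchanged.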
How is water retrieved?
Information about how water is retrieved is missing for many structures in the dataset, meaning that we do not even know whether water is available, since one of the options is Absent. Many of the missing values come from bivouacs, which often lack information about services and details of the structure, due once again to the fact that bivouacs are kept as basic as possible. It is therefore more interesting to visualize how water is retrieved for the structures of the other types, which have a more meaningful impact on the surrounding environment.
This is a waffle plot. Each small square represents a structure and its color corresponds to a specific way of retrieving water.
We can see that the majority of structures obtain water by capturing it from nearby natural sources. In general, water is retrieved with respect for nature, and aqueducts are built only in a small percentage of cases.
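The layout behind a waffle plot can be sketched with a few lines of data preparation: every structure gets one cell of a grid, and cells of the same category sit next to each other. This is not the code used for the plot above; the helper waffle_grid, the category labels (apart from Absent) and the counts are made-up assumptions for illustration.

```r
# Toy vector with one water-retrieval modality per structure (made-up values)
water <- c(rep("Natural source", 7), rep("Aqueduct", 3), rep("Absent", 2))

# Hypothetical helper: assign each structure a cell in a grid with n_cols columns
waffle_grid <- function(values, n_cols = 4) {
  values <- sort(values)            # keep equal categories contiguous
  idx <- seq_along(values) - 1      # zero-based position of each square
  data.frame(
    x = idx %% n_cols + 1,          # column of the square
    y = idx %/% n_cols + 1,         # row of the square
    category = values,
    stringsAsFactors = FALSE
  )
}

grid <- waffle_grid(water)
print(head(grid))
```

A data frame of this shape could then be drawn with ggplot2, for example with geom_tile and category mapped to the fill aesthetic.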
Conclusions
The purpose of this project was to explain the principles and the objectives of CAI through visualizations applied to one of its many activities: the construction and management of structures suited to mountain environments and to hosting alpine enthusiasts who care for and respect such pristine and sacred places. This is shown by the tendency to build more and more basic structures as the altitude increases, so as to impact the area as little as possible. Moreover, the services offered by CAI facilities are basic and there is no room for luxury and comfort, as one would expect from a hotel, for example. CAI aims to be present throughout Italy, and its effort is intended to promote respect for the mountains and a sustainable way of experiencing these wonders of nature.